On Online Attention-Based Speech Recognition and Joint Mandarin Character-Pinyin Training

نویسندگان

  • William Chan
  • Ian Lane
چکیده

In this paper, we explore the use of attention-based models for online speech recognition without the usage of language models or searching. Our model is based on an attention-based neural network which directly emits English/Mandarin characters as outputs. The model jointly learns the pronunciation, acoustic and language model. We evaluate the model for online speech recognition on English and Mandarin. On English, we achieve a 33.0% WER on the WSJ task, or a 5.4% absolute reduction in WER compared to an online CTC based system. We also introduce a new training method and show how we can learn joint Mandarin Character-Pinyin models. Our Mandarin character only model achieves a 72% CER on the GALE Phase 2 evaluation, and with our joint Mandarin Character-Pinyin model, we achieve 59.3% CER or 12.7% absolute improvement over the character only model.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Prosodic modeling in large vocabulary Mandarin speech recognition

The issue of incorporating prosodic information into speech recognition processes has emerged in recent years. In this work we present a complete framework for Mandarin speech recognition with prosodic modeling considering two-level hierarchical prosodic information for Mandarin Chinese. We developed a GMM-based, a decision-tree-based, and a hybrid approach. The best improvements in character r...

متن کامل

Linear Reranking Model for Chinese Pinyin-to-Character Conversion

Pinyin-to-character conversion is an important task for Chinese natural language processing tasks. Previous work mainly focused on n-gram language models and machine learning approaches, or with additional hand-crafted or automatic rule-based post-processing. There are two problems unable to solve for word n-gram language model: out-of-vocabulary word recognition and long-distance grammatical c...

متن کامل

N-Best Re-scoring Approaches for Mandarin Speech Recognition

The predominant language model for speech recognition is n-gram language model, which is locally learned and usually lacks global linguistic information such as long-distance syntactic constraints. We first explore two n-best re-scoring approaches for Mandarin speech recognition to overcome this problem. The first approach is linear re-scoring that can combine several language models from vario...

متن کامل

An Empirical Study of Word Error Minimization Approaches for Mandarin Large Vocabulary Continuous Speech Recognition

This paper presents an empirical study of word error minimization approaches for Mandarin large vocabulary continuous speech recognition (LVCSR). First, the minimum phone error (MPE) criterion, which is one of the most popular discriminative training criteria, is extensively investigated for both acoustic model training and adaptation in a Mandarin LVCSR system. Second, the word error minimizat...

متن کامل

The Effect of Orthography on L2 Perception

Mandarin Chinese has two orthographic systems: Chinese characters and Pinyin. While Pinyin is transparent to Mandarin pronunciation, characters are opaque and seldom relate to sound. This study aims to find out the effect of these two systems on Cantonese listeners who are L2 learners of Mandarin. Native Hong Kong Cantonese speakers participated in word recognition experiments which included a ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016